What is Drone Navigation? Drone navigation is the task of autonomously controlling a drone so that it can plan and fly safe paths through different environments.
Papers and Code
Aug 29, 2025
Abstract: Natural Language-Guided Drones (NLGD) provide a novel paradigm for tasks such as target matching and navigation. However, the wide field of view and complex compositional semantics in drone scenarios pose challenges for vision-language understanding. Mainstream Vision-Language Models (VLMs) emphasize global alignment while lacking fine-grained semantics, and existing hierarchical methods depend on precise entity partitioning and strict containment, limiting effectiveness in dynamic environments. To address this, we propose the Hierarchical Cross-Granularity Contrastive and Matching learning (HCCM) framework with two components: (1) Region-Global Image-Text Contrastive Learning (RG-ITC), which avoids precise scene partitioning and captures hierarchical local-to-global semantics by contrasting local visual regions with global text and vice versa; (2) Region-Global Image-Text Matching (RG-ITM), which dispenses with rigid constraints and instead evaluates local semantic consistency within global cross-modal representations, enhancing compositional reasoning. Moreover, drone text descriptions are often incomplete or ambiguous, destabilizing alignment. HCCM introduces a Momentum Contrast and Distillation (MCD) mechanism to improve robustness. Experiments on GeoText-1652 show HCCM achieves state-of-the-art Recall@1 of 28.8% (image retrieval) and 14.7% (text retrieval). On the unseen ERA dataset, HCCM demonstrates strong zero-shot generalization with 39.93% mean recall (mR), outperforming fine-tuned baselines.
* Accepted by ACM MM'25
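To make the RG-ITC idea concrete, here is a minimal sketch (tensor names, shapes, and the mean-pooling choice are illustrative assumptions, not the authors' code): pooled local region embeddings are contrasted against global text embeddings with a symmetric InfoNCE-style loss.

```python
# Sketch of region-global image-text contrastive learning (RG-ITC).
# Shapes and pooling are assumptions for illustration only.
import torch
import torch.nn.functional as F

def rg_itc_loss(region_feats, text_feats, temperature=0.07):
    """region_feats: (B, R, D) local visual region embeddings;
    text_feats: (B, D) global text embeddings."""
    img = F.normalize(region_feats.mean(dim=1), dim=-1)   # local-to-global pool
    txt = F.normalize(text_feats, dim=-1)
    logits = img @ txt.t() / temperature                  # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric loss: image-to-text and text-to-image directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```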

Aug 27, 2025
Abstract: Cross-view geo-localization is a critical task for UAV navigation, event detection, and aerial surveying, as it enables matching between drone-captured and satellite imagery. Most existing approaches embed multi-modal data into a joint feature space to maximize the similarity of paired images. However, these methods typically assume perfect alignment of image pairs during training, which rarely holds true in real-world scenarios. In practice, factors such as urban canyon effects, electromagnetic interference, and adverse weather frequently induce GPS drift, resulting in systematic alignment shifts where only partial correspondences exist between pairs. Despite its prevalence, this source of noisy correspondence has received limited attention in current research. In this paper, we formally introduce and address the Noisy Correspondence on Cross-View Geo-Localization (NC-CVGL) problem, aiming to bridge the gap between idealized benchmarks and practical applications. To this end, we propose PAUL (Partition and Augmentation by Uncertainty Learning), a novel framework that partitions and augments training data based on estimated data uncertainty through uncertainty-aware co-augmentation and evidential co-training. Specifically, PAUL selectively augments regions with high correspondence confidence and utilizes uncertainty estimation to refine feature learning, effectively suppressing noise from misaligned pairs. Distinct from traditional filtering or label correction, PAUL leverages both data uncertainty and loss discrepancy for targeted partitioning and augmentation, thus providing robust supervision for noisy samples. Comprehensive experiments validate the effectiveness of individual components in PAUL, which consistently achieves superior performance over other competitive noisy-correspondence-driven methods across various noise ratios.
* 10 pages
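A common recipe for this kind of noisy-correspondence partitioning is sketched below for illustration (a generic loss-based split, not necessarily PAUL's exact procedure): fit a two-component mixture to per-pair losses and treat the low-loss component as the well-aligned partition to augment, with the rest handled by uncertainty-aware refinement.

```python
# Generic loss-based partitioning of training pairs into clean/noisy sets.
import numpy as np
from sklearn.mixture import GaussianMixture

def partition_by_loss(per_pair_losses, threshold=0.5):
    losses = np.asarray(per_pair_losses).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2).fit(losses)
    clean_comp = int(np.argmin(gmm.means_.ravel()))   # low-loss component
    p_clean = gmm.predict_proba(losses)[:, clean_comp]
    clean_idx = np.where(p_clean > threshold)[0]      # candidates to augment
    noisy_idx = np.where(p_clean <= threshold)[0]     # down-weight / refine
    return clean_idx, noisy_idx
```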

Aug 17, 2025
Abstract: Image-goal navigation (ImageNav) tasks a robot with autonomously exploring an unknown environment and reaching a location that visually matches a given target image. While prior works primarily study ImageNav for ground robots, enabling this capability for autonomous drones is substantially more challenging due to their need for high-frequency feedback control and global localization for stable flight. In this paper, we propose a novel sim-to-real framework that leverages visual reinforcement learning (RL) to achieve ImageNav for drones. To enhance visual representation ability, our approach trains the vision backbone with auxiliary tasks, including image perturbations and future transition prediction, which results in more effective policy training. The proposed algorithm enables end-to-end ImageNav with direct velocity control, eliminating the need for external localization. Furthermore, we integrate a depth-based safety module for real-time obstacle avoidance, allowing the drone to safely navigate in cluttered environments. Unlike most existing drone navigation methods that focus solely on reference tracking or obstacle avoidance, our framework supports comprehensive navigation behaviors--autonomous exploration, obstacle avoidance, and image-goal seeking--without requiring explicit global mapping. Code and model checkpoints will be released upon acceptance.
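The auxiliary future-transition objective can be sketched as follows (module names, the stop-gradient target, and the MSE objective are assumptions for illustration, not the paper's code): a small head predicts the next latent state from the current latent and the action, giving the vision backbone an extra training signal.

```python
# Sketch of a future-transition auxiliary task for the vision backbone.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransitionHead(nn.Module):
    def __init__(self, latent_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim))

    def forward(self, z_t, a_t):
        return self.net(torch.cat([z_t, a_t], dim=-1))

def aux_transition_loss(encoder, head, obs_t, act_t, obs_next):
    z_t = encoder(obs_t)
    with torch.no_grad():             # stop gradients through the target
        z_next = encoder(obs_next)
    return F.mse_loss(head(z_t, act_t), z_next)
```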

Aug 11, 2025
Abstract: This paper introduces an advanced AI-driven perception system for autonomous quadcopter navigation in GPS-denied indoor environments. The proposed framework leverages cloud computing to offload computationally intensive tasks and incorporates a custom-designed printed circuit board (PCB) for efficient sensor data acquisition, enabling robust navigation in confined spaces. The system integrates YOLOv11 for object detection, Depth Anything V2 for monocular depth estimation, a PCB equipped with Time-of-Flight (ToF) sensors and an Inertial Measurement Unit (IMU), and a cloud-based Large Language Model (LLM) for context-aware decision-making. A virtual safety envelope, enforced by calibrated sensor offsets, ensures collision avoidance, while a multithreaded architecture achieves low-latency processing. Enhanced spatial awareness is facilitated by 3D bounding box estimation with Kalman filtering. Experimental results in an indoor testbed demonstrate strong performance, with object detection achieving a mean Average Precision (mAP50) of 0.6, depth estimation Mean Absolute Error (MAE) of 7.2 cm, only 16 safety envelope breaches across 42 trials over approximately 11 minutes, and end-to-end system latency below 1 second. This cloud-supported, high-intelligence framework serves as an auxiliary perception and navigation system, complementing state-of-the-art drone autonomy for GPS-denied confined spaces.
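The virtual safety envelope can be illustrated with a simple clearance check (sensor names, offsets, and the threshold below are placeholder assumptions, not the paper's calibration): each ToF reading is corrected by its calibrated offset and compared against the required clearance.

```python
# Sketch of a virtual safety envelope enforced from ToF readings.
ENVELOPE_M = 0.5              # required clearance around the airframe (m)
SENSOR_OFFSETS_M = {          # calibrated sensor-to-hull offsets (placeholders)
    "front": 0.08, "rear": 0.08, "left": 0.05, "right": 0.05,
}

def envelope_breaches(tof_readings_m):
    """tof_readings_m: dict of sensor name -> measured range in metres.
    Returns the directions whose clearance falls inside the envelope."""
    breaches = []
    for name, reading in tof_readings_m.items():
        clearance = reading - SENSOR_OFFSETS_M.get(name, 0.0)
        if clearance < ENVELOPE_M:
            breaches.append(name)
    return breaches             # non-empty => trigger avoidance behaviour
```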

Jul 24, 2025
Abstract: Interacting with real-world objects in Mixed Reality (MR) often proves difficult when they are crowded, distant, or partially occluded, hindering straightforward selection and manipulation. We observe that these difficulties stem from performing interaction directly on physical objects, where input is tightly coupled to their physical constraints. Our key insight is to decouple interaction from these constraints by introducing proxies: abstract representations of real-world objects. We embody this concept in Reality Proxy, a system that seamlessly shifts interaction targets from physical objects to their proxies during selection. Beyond facilitating basic selection, Reality Proxy uses AI to enrich proxies with semantic attributes and hierarchical spatial relationships of their corresponding physical objects, enabling novel interactions in MR that were previously cumbersome, such as skimming, attribute-based filtering, navigating nested groups, and complex multi-object selections, all without requiring new gestures or menu systems. We demonstrate Reality Proxy's versatility across diverse scenarios, including office information retrieval, large-scale spatial navigation, and multi-drone control. An expert evaluation supports the system's utility and usability, suggesting that proxy-based abstractions offer a powerful and generalizable interaction paradigm for future MR systems.
* 16 pages, 9 figures. Accepted for publication in UIST'25 (The 38th
Annual ACM Symposium on User Interface Software and Technology), Busan,
Republic of Korea, 28 Sep - 1 Oct 2025
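One plausible reading of the proxy abstraction, sketched as a data structure (the field names and filter helper are assumptions, not the Reality Proxy API): each proxy carries semantic attributes and nests its group members, so attribute-based filtering becomes a hierarchy traversal.

```python
# Sketch of proxies with semantic attributes and a spatial hierarchy.
from dataclasses import dataclass, field

@dataclass
class Proxy:
    object_id: str
    attributes: dict = field(default_factory=dict)  # e.g. {"type": "drone"}
    children: list = field(default_factory=list)    # nested group members

def filter_proxies(root, predicate):
    """Attribute-based filtering over a nested proxy hierarchy."""
    matches = [root] if predicate(root) else []
    for child in root.children:
        matches.extend(filter_proxies(child, predicate))
    return matches

# Usage: select every proxy tagged as a drone for multi-drone control.
# drones = filter_proxies(scene_root, lambda p: p.attributes.get("type") == "drone")
```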

Jul 23, 2025
Abstract: The increasing demand for fast and cost-effective last-mile delivery solutions has catalyzed significant advancements in drone-based logistics. This research describes the development of an AI-integrated drone delivery system, focusing on route optimization, object detection, secure package handling, and real-time tracking. The proposed system leverages YOLOv4-Tiny for object detection, the NEO-6M GPS module for navigation, and the A7670 SIM module for real-time communication. A comparative analysis of lightweight AI models and hardware components is conducted to determine the optimal configuration for real-time UAV-based delivery. Key challenges, including battery efficiency, regulatory compliance, and security considerations, are addressed through the integration of machine learning techniques, IoT devices, and encryption protocols. Preliminary studies demonstrate improved delivery times compared to conventional ground-based logistics, along with highly accurate recipient authentication through facial recognition. The study also discusses ethical implications and societal acceptance of drone deliveries, ensuring compliance with FAA, EASA, and DGCA regulatory standards. Note: This paper presents the architecture, design, and preliminary simulation results of the proposed system. Experimental results, simulation benchmarks, and deployment statistics are currently being acquired; a comprehensive analysis will be included in the extended version of this work.
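As a toy illustration of the route-optimization component (a greedy nearest-neighbour baseline over GPS waypoints; the paper's actual optimizer is not specified here), each leg flies to the closest remaining stop by great-circle distance:

```python
# Greedy nearest-neighbour routing over delivery waypoints (illustrative baseline).
import math

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2 +
         math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def greedy_route(depot, stops):
    """Always fly to the nearest unvisited stop next."""
    route, current, remaining = [depot], depot, list(stops)
    while remaining:
        nxt = min(remaining, key=lambda s: haversine_km(current, s))
        remaining.remove(nxt)
        route.append(nxt)
        current = nxt
    return route
```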

Jul 23, 2025
Abstract: Autonomous drones have gained considerable attention for applications in real-world scenarios, such as search and rescue, inspection, and delivery. As their use becomes ever more pervasive in civilian applications, failure to ensure safe operation can lead to physical damage to the system, environmental pollution, and even loss of human life. Recent work has demonstrated that motion planning techniques can effectively generate a collision-free trajectory during navigation. However, while these methods create the motion plans, they do not inherently consider the safe operational region of the system, leading to potential safety constraint violations during deployment. In this paper, we propose a method that leverages run-time safety assurance in a kinodynamic motion planning scheme to satisfy the system's operational constraints. First, we use a sampling-based geometric planner to determine a high-level collision-free path within a user-defined space. Second, we design a low-level safety assurance filter to provide safety guarantees on the control input of a Linear Quadratic Regulator (LQR) designed for trajectory tracking. We demonstrate our proposed approach in a restricted 3D simulation environment using a model of the Crazyflie 2.0 drone.
* Accepted for publication at 2025 Modeling, Estimation and Control
Conference (MECC)
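A minimal sketch of the two-layer idea, assuming a placeholder double-integrator model rather than the Crazyflie dynamics: an LQR gain tracks the planned trajectory, and a run-time filter projects the commanded input onto the admissible set (real safety assurance filters enforce richer operational constraints than the simple saturation shown here).

```python
# Sketch: LQR tracking with a simple run-time input-projection safety filter.
import numpy as np
from scipy.linalg import solve_continuous_are

def lqr_gain(A, B, Q, R):
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.inv(R) @ B.T @ P            # u = -K (x - x_ref)

def safe_input(u, u_min, u_max):
    """Project the commanded input onto the admissible input set."""
    return np.clip(u, u_min, u_max)

# Usage with a placeholder double-integrator model (not the Crazyflie 2.0).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
K = lqr_gain(A, B, Q=np.eye(2), R=np.eye(1))
x, x_ref = np.array([0.5, 0.0]), np.zeros(2)
u = safe_input(-K @ (x - x_ref), u_min=-0.5, u_max=0.5)
```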

Jul 10, 2025
Abstract: Accurate position estimation is essential for modern navigation systems deployed on autonomous platforms, including ground vehicles, marine vessels, and aerial drones. In this context, Visual Simultaneous Localisation and Mapping (VSLAM), which includes Visual Odometry, relies heavily on the reliable extraction of salient feature points from the visual input data. In this work, we propose an embedded implementation of an unsupervised architecture capable of detecting and describing feature points. It is based on a quantised SuperPoint convolutional neural network. Our objective is to minimise the computational demands of the model while preserving high detection quality, thus facilitating efficient deployment on platforms with limited resources, such as mobile or embedded systems. We implemented the solution on an FPGA System-on-Chip (SoC) platform, specifically the AMD/Xilinx Zynq UltraScale+, where we evaluated the performance of Deep Learning Processing Units (DPUs) and used the Brevitas library and the FINN framework for model quantisation and hardware-aware optimisation. This allowed us to process 640 x 480 pixel images at up to 54 fps on an FPGA platform, outperforming state-of-the-art solutions in the field. We conducted experiments on the TUM dataset to demonstrate and discuss the impact of different quantisation techniques on the accuracy and performance of the model in a visual odometry task.
* Accepted for the DSD 2025 conference in Salerno, Italy
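For flavour, a quantised convolutional stem in the Brevitas style might look like the sketch below (layer sizes and bit widths are illustrative assumptions, not the paper's configuration; consult the Brevitas documentation for the exact API):

```python
# Sketch of a low-bit quantised encoder stem using Brevitas quantised layers.
import torch.nn as nn
from brevitas.nn import QuantConv2d, QuantReLU

class QuantStem(nn.Module):
    def __init__(self, bit_width=4):
        super().__init__()
        self.conv1 = QuantConv2d(1, 64, kernel_size=3, padding=1,
                                 weight_bit_width=bit_width)
        self.relu1 = QuantReLU(bit_width=bit_width)

    def forward(self, x):           # x: (N, 1, 480, 640) grayscale frames
        return self.relu1(self.conv1(x))
```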

Jul 16, 2025
Abstract: Sidewalk delivery robots are a promising solution for urban freight distribution, reducing congestion compared to trucks and providing a safer, higher-capacity alternative to drones. However, unreliable travel times on sidewalks due to pedestrian density, obstacles, and varying infrastructure conditions can significantly affect their efficiency. This study addresses the robust route planning problem for sidewalk robots, explicitly accounting for travel time uncertainty due to varying sidewalk conditions. Optimization is integrated with simulation to reproduce the effect of obstacles and pedestrian flows and generate realistic travel times. The study investigates budgeted, ellipsoidal, and support vector clustering (SVC)-based approaches to deriving uncertainty sets, along with a distributionally robust shortest path (DRSP) method. A realistic case study reproducing pedestrian patterns in Stockholm's city center is used to evaluate the efficiency of robust routing across various robot designs and environmental conditions. The results show that, compared to a conventional shortest path (SP), robust routing significantly enhances operational reliability under variable sidewalk conditions. The ellipsoidal and DRSP approaches outperform the other methods, yielding the most efficient paths in terms of average and worst-case delay. Sensitivity analyses reveal that robust approaches consistently outperform the conventional SP, particularly for sidewalk delivery robots that are wider, slower, and have more conservative navigation behaviors. These benefits are even more pronounced in adverse weather and high pedestrian congestion scenarios.
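A toy contrast between nominal and robust routing, assuming each edge carries a mean travel time and a deviation (this per-edge inflation is an ellipsoidal-flavoured surrogate for illustration, not the paper's exact formulations):

```python
# Nominal vs. robust shortest path on a toy sidewalk graph.
import networkx as nx

def robust_shortest_path(G, source, target, kappa=1.0):
    """Inflate each edge by kappa times its travel-time deviation."""
    for _, _, data in G.edges(data=True):
        data["robust"] = data["mean"] + kappa * data["std"]
    return nx.shortest_path(G, source, target, weight="robust")

# A short but unreliable route loses to a slightly longer, reliable one
# once travel-time deviations are priced in.
G = nx.DiGraph()
G.add_edge("A", "B", mean=60, std=40)   # travel times in seconds
G.add_edge("B", "D", mean=60, std=40)
G.add_edge("A", "C", mean=70, std=5)
G.add_edge("C", "D", mean=70, std=5)
print(robust_shortest_path(G, "A", "D", kappa=1.5))  # ['A', 'C', 'D']
```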

Jun 17, 2025
Abstract: Autonomous navigation by drones using onboard sensors combined with machine learning and computer vision algorithms is impacting a number of domains, including agriculture, logistics, and disaster management. In this paper, we examine the use of drones for assisting visually impaired people (VIPs) in navigating through outdoor urban environments. Specifically, we present a perception-based path planning system for local planning around the neighborhood of the VIP, integrated with a global planner based on GPS and maps for coarse planning. We represent the problem using a geometric formulation and propose a multi-DNN-based framework for obstacle avoidance for both the UAV and the VIP. Our evaluations, conducted on a drone-human system in a university campus environment, verify the feasibility of our algorithms in three scenarios: when the VIP walks on a footpath, near parked vehicles, and in a crowded street.
* 16 pages, 7 figures; Accepted as Late-Breaking Results at the
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
2023
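A generic local-planning step of this kind can be sketched as candidate-heading selection (the scoring rule and parameters below are assumptions, not the paper's multi-DNN pipeline): pick the heading closest to the global-planner bearing that stays clear of all detected obstacles.

```python
# Sketch of obstacle-aware heading selection for a local planner.
import math

def ang_diff(a, b):
    """Smallest absolute angular difference in radians, with wraparound."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)

def choose_heading(goal_bearing, obstacle_bearings,
                   clearance=0.35, num_candidates=36):
    """Return the admissible heading nearest the goal bearing, or None."""
    best, best_cost = None, float("inf")
    for k in range(num_candidates):
        cand = -math.pi + 2 * math.pi * k / num_candidates
        if any(ang_diff(cand, ob) < clearance for ob in obstacle_bearings):
            continue                     # too close to a detected obstacle
        cost = ang_diff(cand, goal_bearing)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best                          # None => hover and re-plan
```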
